
    Splitting Arabic Texts into Elementary Discourse Units

    In this article, we present the first study of the feasibility of segmenting Arabic discourse into elementary discourse units within the Segmented Discourse Representation Theory framework. We first describe our annotation scheme, which defines a set of principles to guide the segmentation process. Two corpora have been annotated according to this scheme: elementary school textbooks and newspaper documents extracted from the syntactically annotated Arabic Treebank. Then, we propose a multiclass supervised learning approach that predicts nested units. Our approach uses a combination of punctuation, morphological, lexical, and shallow syntactic features. We investigate how each feature contributes to the learning process. We show that an extensive morphological analysis is crucial to achieving good results in both corpora. In addition, we show that adding chunks does not boost the performance of our system.

    A control process model of code-switching

    Code-switching (CS) is central to many bilingual communities. Although linguistic and sociolinguistic research has characterised different types of code-switches (alternations, insertions, dense CS), the cognitive control processes (CPs) that mediate them are not well understood. A key issue is how speakers produce the right words in the right order during CS. In speech, serial order emerges from a speech plan in which items are represented in parallel. We propose that entry into the mechanism for speech planning (a competitive queuing mechanism) is governed by the CPs best suited to the particular types of code-switches. Language task schemas external to the language network govern access. In CS, they are coordinated cooperatively and operate in either a coupled or an open control mode. The former permits alternations and insertions, whereas the latter is required for dense CS. We explore predictions of this CP model and its implications for CS research.